Advanced Analytics with R (UG 21-24)
I am Ayush.
I am a researcher working at the intersection of data, law, development and economics.
I teach Data Science using R at Gokhale Institute of Politics and Economics.
I am an RStudio (Posit) certified tidyverse instructor.
I am a researcher at the Oxford Poverty and Human Development Initiative (OPHI) at the University of Oxford.
Reach me
ayush.ap58@gmail.com
ayush.patel@gipe.ac.in
Learn to apply and interpret simple and multiple linear regression models.
References for this lecture:
| ...1 | TV | radio | newspaper | sales |
|---|---|---|---|---|
| 1 | 230.1 | 37.8 | 69.2 | 22.1 |
| 2 | 44.5 | 39.3 | 45.1 | 10.4 |
| 3 | 17.2 | 45.9 | 69.3 | 9.3 |
| 4 | 151.5 | 41.3 | 58.5 | 18.5 |
| 5 | 180.8 | 10.8 | 58.4 | 12.9 |
| 6 | 8.7 | 48.9 | 75.0 | 7.2 |
| 7 | 57.5 | 32.8 | 23.5 | 11.8 |
| 8 | 120.2 | 19.6 | 11.6 | 13.2 |
| 9 | 8.6 | 2.1 | 1.0 | 4.8 |
| 10 | 199.8 | 2.6 | 21.2 | 10.6 |
| 11 | 66.1 | 5.8 | 24.2 | 8.6 |
| 12 | 214.7 | 24.0 | 4.0 | 17.4 |
| 13 | 23.8 | 35.1 | 65.9 | 9.2 |
| 14 | 97.5 | 7.6 | 7.2 | 9.7 |
| 15 | 204.1 | 32.9 | 46.0 | 19.0 |
| 16 | 195.4 | 47.7 | 52.9 | 22.4 |
| 17 | 67.8 | 36.6 | 114.0 | 12.5 |
| 18 | 281.4 | 39.6 | 55.8 | 24.4 |
| 19 | 69.2 | 20.5 | 18.3 | 11.3 |
| 20 | 147.3 | 23.9 | 19.1 | 14.6 |
| 21 | 218.4 | 27.7 | 53.4 | 18.0 |
| 22 | 237.4 | 5.1 | 23.5 | 12.5 |
| 23 | 13.2 | 15.9 | 49.6 | 5.6 |
| 24 | 228.3 | 16.9 | 26.2 | 15.5 |
| 25 | 62.3 | 12.6 | 18.3 | 9.7 |
| 26 | 262.9 | 3.5 | 19.5 | 12.0 |
| 27 | 142.9 | 29.3 | 12.6 | 15.0 |
| 28 | 240.1 | 16.7 | 22.9 | 15.9 |
| 29 | 248.8 | 27.1 | 22.9 | 18.9 |
| 30 | 70.6 | 16.0 | 40.8 | 10.5 |
| 31 | 292.9 | 28.3 | 43.2 | 21.4 |
| 32 | 112.9 | 17.4 | 38.6 | 11.9 |
| 33 | 97.2 | 1.5 | 30.0 | 9.6 |
| 34 | 265.6 | 20.0 | 0.3 | 17.4 |
| 35 | 95.7 | 1.4 | 7.4 | 9.5 |
| 36 | 290.7 | 4.1 | 8.5 | 12.8 |
| 37 | 266.9 | 43.8 | 5.0 | 25.4 |
| 38 | 74.7 | 49.4 | 45.7 | 14.7 |
| 39 | 43.1 | 26.7 | 35.1 | 10.1 |
| 40 | 228.0 | 37.7 | 32.0 | 21.5 |
| 41 | 202.5 | 22.3 | 31.6 | 16.6 |
| 42 | 177.0 | 33.4 | 38.7 | 17.1 |
| 43 | 293.6 | 27.7 | 1.8 | 20.7 |
| 44 | 206.9 | 8.4 | 26.4 | 12.9 |
| 45 | 25.1 | 25.7 | 43.3 | 8.5 |
| 46 | 175.1 | 22.5 | 31.5 | 14.9 |
| 47 | 89.7 | 9.9 | 35.7 | 10.6 |
| 48 | 239.9 | 41.5 | 18.5 | 23.2 |
| 49 | 227.2 | 15.8 | 49.9 | 14.8 |
| 50 | 66.9 | 11.7 | 36.8 | 9.7 |
| 51 | 199.8 | 3.1 | 34.6 | 11.4 |
| 52 | 100.4 | 9.6 | 3.6 | 10.7 |
| 53 | 216.4 | 41.7 | 39.6 | 22.6 |
| 54 | 182.6 | 46.2 | 58.7 | 21.2 |
| 55 | 262.7 | 28.8 | 15.9 | 20.2 |
| 56 | 198.9 | 49.4 | 60.0 | 23.7 |
| 57 | 7.3 | 28.1 | 41.4 | 5.5 |
| 58 | 136.2 | 19.2 | 16.6 | 13.2 |
| 59 | 210.8 | 49.6 | 37.7 | 23.8 |
| 60 | 210.7 | 29.5 | 9.3 | 18.4 |
| 61 | 53.5 | 2.0 | 21.4 | 8.1 |
| 62 | 261.3 | 42.7 | 54.7 | 24.2 |
| 63 | 239.3 | 15.5 | 27.3 | 15.7 |
| 64 | 102.7 | 29.6 | 8.4 | 14.0 |
| 65 | 131.1 | 42.8 | 28.9 | 18.0 |
| 66 | 69.0 | 9.3 | 0.9 | 9.3 |
| 67 | 31.5 | 24.6 | 2.2 | 9.5 |
| 68 | 139.3 | 14.5 | 10.2 | 13.4 |
| 69 | 237.4 | 27.5 | 11.0 | 18.9 |
| 70 | 216.8 | 43.9 | 27.2 | 22.3 |
| 71 | 199.1 | 30.6 | 38.7 | 18.3 |
| 72 | 109.8 | 14.3 | 31.7 | 12.4 |
| 73 | 26.8 | 33.0 | 19.3 | 8.8 |
| 74 | 129.4 | 5.7 | 31.3 | 11.0 |
| 75 | 213.4 | 24.6 | 13.1 | 17.0 |
| 76 | 16.9 | 43.7 | 89.4 | 8.7 |
| 77 | 27.5 | 1.6 | 20.7 | 6.9 |
| 78 | 120.5 | 28.5 | 14.2 | 14.2 |
| 79 | 5.4 | 29.9 | 9.4 | 5.3 |
| 80 | 116.0 | 7.7 | 23.1 | 11.0 |
| 81 | 76.4 | 26.7 | 22.3 | 11.8 |
| 82 | 239.8 | 4.1 | 36.9 | 12.3 |
| 83 | 75.3 | 20.3 | 32.5 | 11.3 |
| 84 | 68.4 | 44.5 | 35.6 | 13.6 |
| 85 | 213.5 | 43.0 | 33.8 | 21.7 |
| 86 | 193.2 | 18.4 | 65.7 | 15.2 |
| 87 | 76.3 | 27.5 | 16.0 | 12.0 |
| 88 | 110.7 | 40.6 | 63.2 | 16.0 |
| 89 | 88.3 | 25.5 | 73.4 | 12.9 |
| 90 | 109.8 | 47.8 | 51.4 | 16.7 |
| 91 | 134.3 | 4.9 | 9.3 | 11.2 |
| 92 | 28.6 | 1.5 | 33.0 | 7.3 |
| 93 | 217.7 | 33.5 | 59.0 | 19.4 |
| 94 | 250.9 | 36.5 | 72.3 | 22.2 |
| 95 | 107.4 | 14.0 | 10.9 | 11.5 |
| 96 | 163.3 | 31.6 | 52.9 | 16.9 |
| 97 | 197.6 | 3.5 | 5.9 | 11.7 |
| 98 | 184.9 | 21.0 | 22.0 | 15.5 |
| 99 | 289.7 | 42.3 | 51.2 | 25.4 |
| 100 | 135.2 | 41.7 | 45.9 | 17.2 |
| 101 | 222.4 | 4.3 | 49.8 | 11.7 |
| 102 | 296.4 | 36.3 | 100.9 | 23.8 |
| 103 | 280.2 | 10.1 | 21.4 | 14.8 |
| 104 | 187.9 | 17.2 | 17.9 | 14.7 |
| 105 | 238.2 | 34.3 | 5.3 | 20.7 |
| 106 | 137.9 | 46.4 | 59.0 | 19.2 |
| 107 | 25.0 | 11.0 | 29.7 | 7.2 |
| 108 | 90.4 | 0.3 | 23.2 | 8.7 |
| 109 | 13.1 | 0.4 | 25.6 | 5.3 |
| 110 | 255.4 | 26.9 | 5.5 | 19.8 |
| 111 | 225.8 | 8.2 | 56.5 | 13.4 |
| 112 | 241.7 | 38.0 | 23.2 | 21.8 |
| 113 | 175.7 | 15.4 | 2.4 | 14.1 |
| 114 | 209.6 | 20.6 | 10.7 | 15.9 |
| 115 | 78.2 | 46.8 | 34.5 | 14.6 |
| 116 | 75.1 | 35.0 | 52.7 | 12.6 |
| 117 | 139.2 | 14.3 | 25.6 | 12.2 |
| 118 | 76.4 | 0.8 | 14.8 | 9.4 |
| 119 | 125.7 | 36.9 | 79.2 | 15.9 |
| 120 | 19.4 | 16.0 | 22.3 | 6.6 |
| 121 | 141.3 | 26.8 | 46.2 | 15.5 |
| 122 | 18.8 | 21.7 | 50.4 | 7.0 |
| 123 | 224.0 | 2.4 | 15.6 | 11.6 |
| 124 | 123.1 | 34.6 | 12.4 | 15.2 |
| 125 | 229.5 | 32.3 | 74.2 | 19.7 |
| 126 | 87.2 | 11.8 | 25.9 | 10.6 |
| 127 | 7.8 | 38.9 | 50.6 | 6.6 |
| 128 | 80.2 | 0.0 | 9.2 | 8.8 |
| 129 | 220.3 | 49.0 | 3.2 | 24.7 |
| 130 | 59.6 | 12.0 | 43.1 | 9.7 |
| 131 | 0.7 | 39.6 | 8.7 | 1.6 |
| 132 | 265.2 | 2.9 | 43.0 | 12.7 |
| 133 | 8.4 | 27.2 | 2.1 | 5.7 |
| 134 | 219.8 | 33.5 | 45.1 | 19.6 |
| 135 | 36.9 | 38.6 | 65.6 | 10.8 |
| 136 | 48.3 | 47.0 | 8.5 | 11.6 |
| 137 | 25.6 | 39.0 | 9.3 | 9.5 |
| 138 | 273.7 | 28.9 | 59.7 | 20.8 |
| 139 | 43.0 | 25.9 | 20.5 | 9.6 |
| 140 | 184.9 | 43.9 | 1.7 | 20.7 |
| 141 | 73.4 | 17.0 | 12.9 | 10.9 |
| 142 | 193.7 | 35.4 | 75.6 | 19.2 |
| 143 | 220.5 | 33.2 | 37.9 | 20.1 |
| 144 | 104.6 | 5.7 | 34.4 | 10.4 |
| 145 | 96.2 | 14.8 | 38.9 | 11.4 |
| 146 | 140.3 | 1.9 | 9.0 | 10.3 |
| 147 | 240.1 | 7.3 | 8.7 | 13.2 |
| 148 | 243.2 | 49.0 | 44.3 | 25.4 |
| 149 | 38.0 | 40.3 | 11.9 | 10.9 |
| 150 | 44.7 | 25.8 | 20.6 | 10.1 |
| 151 | 280.7 | 13.9 | 37.0 | 16.1 |
| 152 | 121.0 | 8.4 | 48.7 | 11.6 |
| 153 | 197.6 | 23.3 | 14.2 | 16.6 |
| 154 | 171.3 | 39.7 | 37.7 | 19.0 |
| 155 | 187.8 | 21.1 | 9.5 | 15.6 |
| 156 | 4.1 | 11.6 | 5.7 | 3.2 |
| 157 | 93.9 | 43.5 | 50.5 | 15.3 |
| 158 | 149.8 | 1.3 | 24.3 | 10.1 |
| 159 | 11.7 | 36.9 | 45.2 | 7.3 |
| 160 | 131.7 | 18.4 | 34.6 | 12.9 |
| 161 | 172.5 | 18.1 | 30.7 | 14.4 |
| 162 | 85.7 | 35.8 | 49.3 | 13.3 |
| 163 | 188.4 | 18.1 | 25.6 | 14.9 |
| 164 | 163.5 | 36.8 | 7.4 | 18.0 |
| 165 | 117.2 | 14.7 | 5.4 | 11.9 |
| 166 | 234.5 | 3.4 | 84.8 | 11.9 |
| 167 | 17.9 | 37.6 | 21.6 | 8.0 |
| 168 | 206.8 | 5.2 | 19.4 | 12.2 |
| 169 | 215.4 | 23.6 | 57.6 | 17.1 |
| 170 | 284.3 | 10.6 | 6.4 | 15.0 |
| 171 | 50.0 | 11.6 | 18.4 | 8.4 |
| 172 | 164.5 | 20.9 | 47.4 | 14.5 |
| 173 | 19.6 | 20.1 | 17.0 | 7.6 |
| 174 | 168.4 | 7.1 | 12.8 | 11.7 |
| 175 | 222.4 | 3.4 | 13.1 | 11.5 |
| 176 | 276.9 | 48.9 | 41.8 | 27.0 |
| 177 | 248.4 | 30.2 | 20.3 | 20.2 |
| 178 | 170.2 | 7.8 | 35.2 | 11.7 |
| 179 | 276.7 | 2.3 | 23.7 | 11.8 |
| 180 | 165.6 | 10.0 | 17.6 | 12.6 |
| 181 | 156.6 | 2.6 | 8.3 | 10.5 |
| 182 | 218.5 | 5.4 | 27.4 | 12.2 |
| 183 | 56.2 | 5.7 | 29.7 | 8.7 |
| 184 | 287.6 | 43.0 | 71.8 | 26.2 |
| 185 | 253.8 | 21.3 | 30.0 | 17.6 |
| 186 | 205.0 | 45.1 | 19.6 | 22.6 |
| 187 | 139.5 | 2.1 | 26.6 | 10.3 |
| 188 | 191.1 | 28.7 | 18.2 | 17.3 |
| 189 | 286.0 | 13.9 | 3.7 | 15.9 |
| 190 | 18.7 | 12.1 | 23.4 | 6.7 |
| 191 | 39.5 | 41.1 | 5.8 | 10.8 |
| 192 | 75.5 | 10.8 | 6.0 | 9.9 |
| 193 | 17.2 | 4.1 | 31.6 | 5.9 |
| 194 | 166.8 | 42.0 | 3.6 | 19.6 |
| 195 | 149.7 | 35.6 | 6.0 | 17.3 |
| 196 | 38.2 | 3.7 | 13.8 | 7.6 |
| 197 | 94.2 | 4.9 | 8.1 | 9.7 |
| 198 | 177.0 | 9.3 | 6.4 | 12.8 |
| 199 | 283.6 | 42.0 | 66.2 | 25.5 |
| 200 | 232.1 | 8.6 | 8.7 | 13.4 |
Correlation between sales and TV:
[1] 0.7822244
Correlation between sales and radio:
[1] 0.5762226
Correlation between sales and newspaper:
[1] 0.228299
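Correlations like these can be computed with `cor()`. The following is a minimal sketch, not necessarily the code used for these slides. To stay self-contained it uses only the first 10 rows of the table above, stored in a hypothetical data frame `ads`, so its values will differ from the full-data values shown.

```r
# First 10 rows of the advertising table above (illustrative subset only)
ads <- data.frame(
  TV        = c(230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2, 8.6, 199.8),
  radio     = c(37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8, 19.6, 2.1, 2.6),
  newspaper = c(69.2, 45.1, 69.3, 58.5, 58.4, 75.0, 23.5, 11.6, 1.0, 21.2),
  sales     = c(22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8, 10.6)
)

# Pearson correlation of sales with each advertising channel
cors <- sapply(ads[c("TV", "radio", "newspaper")],
               function(x) cor(ads$sales, x))
print(round(cors, 3))
```

With the full 200-row data in a data frame (for example `advertisement`, the name used in the `lm()` calls in this lecture), the same `cor()` call applies unchanged.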
A linear model can help us answer several questions: is there an association between the response and the predictors, can we predict future sales, is the relationship linear, and do the predictors interact?
\[ Y \approx \beta_0 + \beta_1X \]
\[\beta_0 \text{ is the population intercept}\]
\[\beta_1 \text{ is the population slope}\]
Our estimates are represented as:
\[ \hat\beta_0\] \[\hat\beta_1\]
The idea is, essentially, to draw a line through the points such that the distance of every point from the line is as small as possible.
One way to estimate the population coefficients (parameters) is least squares: choose the estimates that minimize the residual sum of squares (RSS).
\[sales \approx \beta_0 + \beta_1 \times TV\]
\[\hat y_i = \hat\beta_0 + \hat\beta_1x_i\] \[e_i = y_i - \hat y_i\]
\[RSS = e_1^2 + e_2^2 + \cdots + e_n^2\]
Least squares coefficient estimates:
\[ \hat\beta_1 = \frac{\sum_{i=1}^n(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n(x_i - \bar x)^2} \]
\[ \hat\beta_0 = \bar y - \hat\beta_1\bar x \]
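The closed-form estimates can be checked against `lm()`. This is a minimal sketch under one assumption: it uses only the first 10 rows of the advertising table (hypothetical data frame `ads`) so that it runs on its own, which means the numbers will not match the full-data fit. The helper `rss()` is introduced here for illustration.

```r
# First 10 rows of the advertising table (illustrative subset only)
ads <- data.frame(
  TV    = c(230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2, 8.6, 199.8),
  sales = c(22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8, 10.6)
)
x <- ads$TV
y <- ads$sales

# Closed-form least squares estimates, term by term as in the formulas
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Residual sum of squares for any candidate line (illustrative helper)
rss <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# lm() should return the same estimates
fit <- lm(sales ~ TV, data = ads)
print(c(intercept = beta0_hat, slope = beta1_hat))
print(coef(fit))

# Moving away from the least squares line can only increase the RSS
print(rss(beta0_hat, beta1_hat) < rss(beta0_hat + 0.5, beta1_hat))
```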
Call:
lm(formula = sales ~ TV, data = advertisement)
Coefficients:
(Intercept) TV
7.03259 0.04754
“For every additional $1000 spent on the TV advertising budget, sales increase by about 47.5 units.”
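The arithmetic behind this interpretation can be sketched as follows. It assumes, as the quoted statement implies, that TV is recorded in thousands of dollars and sales in thousands of units. The function `predict_sales()` is a hypothetical helper, not part of the original slides.

```r
b0 <- 7.03259  # intercept from the lm() output above
b1 <- 0.04754  # slope on TV from the lm() output above

# Hypothetical helper: predicted sales for a given TV budget
predict_sales <- function(tv) b0 + b1 * tv

# Raising the TV budget by one unit (assumed to be $1000)
# changes predicted sales by exactly b1, whatever the starting budget
extra_sales <- predict_sales(101) - predict_sales(100)

# Convert from thousands of units to units: about 47.5
print(extra_sales * 1000)
```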
Use the Auto data from the {ISLR2} package and fit this model:
\[horsepower = \beta_0 + \beta_1 \times weight + \epsilon\]
Find the coefficient estimates \[\hat\beta_0\] and \[\hat\beta_1\] and the residuals.
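One possible starting point for this exercise is sketched below. It assumes the {ISLR2} package is installed; the guard simply skips the fit when it is not. The object name `auto_fit` is hypothetical.

```r
# Fit horsepower on weight only if ISLR2 is available
if (requireNamespace("ISLR2", quietly = TRUE)) {
  auto_fit <- lm(horsepower ~ weight, data = ISLR2::Auto)

  # Coefficient estimates: intercept and slope on weight
  print(coef(auto_fit))

  # Residuals: observed horsepower minus fitted horsepower
  print(head(residuals(auto_fit)))
} else {
  message("Install ISLR2 first: install.packages('ISLR2')")
}
```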
\[\text{Compute the standard error of } \hat\beta_0 \text{ and } \hat\beta_1\]
The idea is similar to the standard error of a sample mean:
\[Var(\hat\mu) = SE(\hat\mu)^2 = \frac{\sigma^2}{n}\]
For the regression coefficients, the formulas are:
\[SE(\hat\beta_0)^2 = \sigma^2\left[\frac{1}{n}+\frac{\bar x^2}{\sum_{i=1}^n(x_i - \bar x)^2}\right]\hspace{2cm}SE(\hat\beta_1)^2 = \frac{\sigma^2}{\sum_{i=1}^n(x_i - \bar x)^2}\]
What is sigma here?
\[\text{What happens when the } x_i \text{ are spread out?}\]
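The standard error formulas can be evaluated by hand and compared with what `lm()` reports. This is a sketch under two assumptions: the unknown sigma is replaced by the usual estimate, the residual standard error sqrt(RSS / (n - 2)), and only the first 10 rows of the advertising table are used (hypothetical data frame `ads`) so that the block runs on its own.

```r
# First 10 rows of the advertising table (illustrative subset only)
ads <- data.frame(
  TV    = c(230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2, 8.6, 199.8),
  sales = c(22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8, 10.6)
)
fit <- lm(sales ~ TV, data = ads)

n   <- nrow(ads)
x   <- ads$TV
sxx <- sum((x - mean(x))^2)        # spread of the x values

# Estimate sigma with the residual standard error
rss       <- sum(residuals(fit)^2)
sigma_hat <- sqrt(rss / (n - 2))

# Standard errors from the formulas above
se_b0 <- sigma_hat * sqrt(1 / n + mean(x)^2 / sxx)
se_b1 <- sigma_hat / sqrt(sxx)

# Compare with the values computed by summary()
se_lm <- summary(fit)$coefficients[, "Std. Error"]
print(cbind(by_hand = c(se_b0, se_b1), from_lm = se_lm))
```

Because `sxx` sits in the denominator of both formulas, more spread-out x values give smaller standard errors for a given sigma.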
We can use standard errors to perform hypothesis tests. In practice, the t-statistic is used to test the null hypothesis that the slope is zero:
\[t = \frac{\hat\beta_1 - 0}{SE(\hat\beta_1)}\]
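As a worked example, the t-statistic for TV can be recomputed from the estimate, standard error and degrees of freedom printed in the `summary()` output in this section. The two-sided p-value from `pt()` is a sketch of the standard calculation; the variable names are illustrative.

```r
est <- 0.047537   # estimate for TV, from the summary output
se  <- 0.002691   # its standard error, from the summary output
df  <- 198        # residual degrees of freedom (n - 2)

# t-statistic for the null hypothesis that the slope is 0
t_stat <- (est - 0) / se
print(round(t_stat, 2))   # 17.67, matching the reported t value

# Two-sided p-value from the t distribution with 198 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
print(p_value < 2e-16)    # TRUE, consistent with "<2e-16" in the output
```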
Call:
lm(formula = sales ~ TV, data = advertisement)
Residuals:
Min 1Q Median 3Q Max
-8.3860 -1.9545 -0.1913 2.0671 7.2124
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.032594 0.457843 15.36 <2e-16 ***
TV 0.047537 0.002691 17.67 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.259 on 198 degrees of freedom
Multiple R-squared: 0.6119, Adjusted R-squared: 0.6099
F-statistic: 312.1 on 1 and 198 DF, p-value: < 2.2e-16
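The last lines of this output can be cross-checked against numbers already shown in the lecture. In simple linear regression, R-squared equals the squared correlation between response and predictor, and the F-statistic equals the squared t-statistic of the slope. The sketch below uses only reported values, so small differences come from rounding in the printed output.

```r
r_sales_tv <- 0.7822244            # correlation between sales and TV
t_tv       <- 0.047537 / 0.002691  # t-statistic for TV from the table
n          <- 200                  # observations (198 residual df + 2)

# R-squared as the squared correlation
r2 <- r_sales_tv^2
print(round(r2, 4))       # 0.6119

# Adjusted R-squared with one predictor
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - 2)
print(round(adj_r2, 4))   # 0.6099

# F-statistic as the squared t-statistic (roughly 312, reported 312.1)
print(t_tv^2)
```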